Model-based object classification and target recognition

ABSTRACT

A method for at least one of model-based classification and target recognition of an object. The method includes recording an image of an object and determining a feature that represents a part of the object. Moreover, the method includes determining at least one condition associated with the feature that indicates an applicability of the feature based on at least one of: covering by parts of the object, a current illumination situation, time the image is recorded, movement information of the object, movement information of other objects, and a position and orientation of the object. The instant abstract is neither intended to define the invention disclosed in this specification nor intended to limit the scope of the invention in any way.

[0001] The present invention relates in general to model-based object classification and target recognition and in particular to the structure and execution of models for object classification and localization.

[0002] All previously known methods from the prior art which use explicit geometry models for matching extract only a few features at the same time from the input data. There are several reasons for this.

[0003] For one thing, it is difficult to fuse different features so that identical benchmark values have an identical meaning. For another, there are purely practical reasons that will be explained in more detail in the two following sections.

[0004] Furthermore, the rules governing when a feature of a model is to be checked are either programmed in just as firmly as the feature itself, or they are determined from the geometry of the object.

[0005] The previously known systems, thus also those of D. G. Lowe in Fitting Parametrized Three-Dimensional Models to Images, IEEE Transact. on Pattern Analysis and Machine Intelligence, Vol. 13, No. 5, 1991, those of L. Stephan et al. in Portable, scalable architecture for model-based FLIR ATR and SAR/FLIR fusion, Proc. of SPIE, Vol. 3718, Automatic Target Recognition IX, August 1999, and those described in EP-A-622 750, have in general a fixed arrangement of the image processing and in particular a fixed arrangement of the preprocessing.

[0006] According to these known systems, first of all the image is read in, then it is preprocessed and subsequently the matching is carried out. In the known systems this means that either all preprocessing whose results are contained in any model has to be carried out, or firmly implemented tests have to be carried out to avoid this.

[0007] An object of the present invention is therefore to make available a method for object classification and target recognition which minimizes the necessary computer resources and yet at the same time is more robust.

[0008] Another object of the present invention is to make available a method for object classification and target recognition which minimizes the number of preprocessing steps.

[0009] These objects and other objects to be taken from the specification and figures below are attained by a method according to the attached claims.

[0010] Exemplary embodiments of the invention will be explained in more detail on the basis of the drawings, which show:

[0011] FIG. 1 The sequence of operations of object recognition at the highest level;

[0012] FIG. 2 The detailed sequence of operations of the matching block of FIG. 1;

[0013] FIG. 3 An image acquired in the image creation block of FIG. 1;

[0014] FIG. 4 A region (ROI) enclosing the sought objects, which region comprises a rectangular partial section of the image of FIG. 3; and

[0015] FIG. 5a through 5e How the feature request works on the basis of the example of the edge receptor.

[0016] The present invention is based on the knowledge that certain features are visible only from special views. Thus, e.g., the windows of the cargo hold doors of helicopters are visible only from the side, but not from other angles of view. This applies analogously to the illumination conditions, which permit the recognition of cargo hold doors or of other elements of helicopters (such as, e.g., wheels, lifting load, etc.) only under certain light conditions. Therefore, according to the present invention at least one feature to be recognized is linked to at least one condition or at least one rule. Of course, it is possible to link a plurality of features to respective specific conditions and/or to associate several conditions with a single feature to be recognized. Under these conditions, only those features whose linked condition is met have to be extracted from the image. In other words, no object classification and/or target recognition needs to be carried out for a cargo hold door that, given the position of the helicopter with reference to a camera, cannot be visible at all.

[0017] According to the invention, the possibility was found of depositing various features (e.g., edges, area circumferences, hot spots) in the model in a simple and consistent manner and of carrying out the extraction of these features in an effective manner.

[0018] If further features are to be extracted in the known image processing systems according to the prior art cited above, their calls, including parameter transfer, have to be explicitly programmed for each application or each model. This can be more or less expensive, depending on the system. This rigid sequence, comprising the creation of an image, the segmentation of the created image and the preprocessing of the image recorded through the segmentation, is known from EP-A-622 750.

[0019] In accordance with the present invention, each feature that is to be recognized is provided with a condition that establishes its applicability. The algorithm of this condition can be freely programmed as desired and is not restricted only to the geometry of the object. The condition can also examine, e.g., the distance of the object to be recognized from the camera, the illumination conditions (e.g., contrast), speed, height, relative position, etc.
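
By way of illustration only, the linkage of a feature to a freely programmable condition might be sketched as follows in Python; all names (ViewContext, Feature, the threshold values) are hypothetical and not part of the specification:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ViewContext:
    """Information available when a condition is evaluated (hypothetical fields)."""
    distance_to_camera: float  # e.g. in metres
    contrast: float            # e.g. normalized to 0..1
    azimuth_deg: float         # angle of view onto the object

@dataclass
class Feature:
    """A model feature linked to a freely programmable applicability condition."""
    name: str
    condition: Callable[[ViewContext], bool]

# A cargo-hold-door feature is only checked when the helicopter is seen
# roughly from the side and the contrast suffices (illustrative values).
door_window = Feature(
    name="cargo_hold_door_window",
    condition=lambda ctx: 60.0 <= ctx.azimuth_deg <= 120.0 and ctx.contrast > 0.3,
)

ctx = ViewContext(distance_to_camera=800.0, contrast=0.5, azimuth_deg=90.0)
if door_window.condition(ctx):
    pass  # only now would the feature be extracted from the image
```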

[0020] By considering one or more of these conditions, the superfluous work caused by “non-visibility” or “non-recordability” of a feature is avoided, and the method according to the invention is at the same time made more robust, since missing features do not lead to a worse assessment of the model.

[0021] According to a further particularly preferred aspect of the present invention, each feature that meets a condition and is thus required in a preprocessing of a partial step of the image processing is requested by this partial step. The sequence of the preprocessing as well as the algorithm of the partial step are thereby deposited in the model (e.g., as the number of a function in a list of available functions). The superfluous work of a rigid arrangement of image creation, preprocessing and classification/localization is thus avoided.

[0022] Since different partial steps may need the same features (e.g., the left edge and right edge features of an object require the “edge image” preprocessing), or partial results of lower preprocessing represent inputs for higher preprocessing (e.g., edge image and wavelet segmentation of the filtered original image, with the aid of which the local characteristics of a function can be studied efficiently by means of local wavelet bases), all reusable preprocessing steps are stored in the sequence of the compilation, beginning with the original image. If a specific preprocessing is required, a “request” for this preprocessing with all preceding steps of this preprocessing, beginning with the original, is carried out through the image processing.

[0023] The treatment of the request consists in carrying out the preprocessing and depositing and making available the result or, if it is already present, making available the deposited result without carrying out a new calculation. As already mentioned, existing preprocessing or preprocessing series can thus be quickly called from an intermediate memory (cache). If, e.g., preprocessing 1 is carried out for a feature A, and preprocessing 1, 2 and 3 are necessary for a further feature B, the preprocessing 1 of feature A can, according to the invention, be accessed in intermediate storage, which means the processing time is reduced.
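
A minimal sketch of how such a request-and-cache mechanism could be realized (the class, its method names and the routine signatures are illustrative assumptions, not the claimed implementation):

```python
class PreprocessingTree:
    """Caches every preprocessing result under its chain of (routine, parameters).

    A request either returns the stored result or computes it exactly once
    from its parent result, so preprocessing 1 carried out for feature A is
    reused when feature B requests the chain 1 -> 2 -> 3.
    """

    def __init__(self, original_image):
        self.cache = {(): original_image}  # the root holds the original image

    def request(self, chain):
        """chain: sequence of (routine, params-dict) steps from the root."""
        key = tuple((op.__name__, tuple(sorted(p.items()))) for op, p in chain)
        if key not in self.cache:
            parent = self.request(chain[:-1])       # root () is always present
            op, params = chain[-1]
            self.cache[key] = op(parent, **params)  # computed exactly once
        return self.cache[key]

# Hypothetical usage with routines such as edge_image and distance_image
# (runnable stand-ins for these routines are sketched further below):
# tree = PreprocessingTree(original_image)
# a = tree.request([(edge_image, {"threshold": 10, "sigma": 1})])
# b = tree.request([(edge_image, {"threshold": 10, "sigma": 1}),
#                   (distance_image, {"maximum_distance": 100})])  # edge image from cache
```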

[0024] With these steps it is possible to extract all the features necessary for the recognition of an object (after a corresponding normalization) and to feed them to the recognition process. One is therefore no longer restricted to a small number of features for reasons of speed or maintenance. Of course, the preprocessing of the system according to the invention also takes calculation time, but only calculations that are absolutely necessary are carried out, since each preprocessing is to be carried out only once. Different features can thus be extracted as long as the total time of all preprocessing does not exceed the maximum run time.

[0025] The method for preprocessing described above can be implemented according to the invention regardless of the fact that certain features are visible only from special views. In other words, the present preprocessing can be carried out independently of the link to one of the certain conditions, although the combination of the two measures has a particularly advantageous effect with reference to the computer resources and the robustness of the system.

[0026] The method for preprocessing according to the invention is particularly advantageous compared to the prior art. The method presented by D. G. Lowe in Fitting Parametrized Three-Dimensional Models to Images, IEEE Transact. on Pattern Analysis and Machine Intelligence, Vol. 13, No. 5, 1991, recognizes the sought objects on the basis of edges. These edges are expressed as parametrized curves, and the free parameters (spatial position and internal degrees of freedom) are determined through an approximation method. The method is relevant in that it deposits geometric preprocessing in a cache. However, the cache of the known method of Lowe relates only to visibility conditions, whereas the cache or intermediate memory according to the invention is not limited in the type of preprocessing. Likewise, the visibility conditions are determined only from the geometry of the object and are not freely selectable. Otherwise the method of Lowe is a typical representative of methods with firmly implemented preprocessing.

[0027] The method according to L. Stephan et al. (Portable, scalable architecture for model-based FLIR ATR and SAR/FLIR fusion, Proc. of SPIE, Vol. 3718, Automatic Target Recognition IX, August 1999) extracts features not specified in detail from radar images (SAR) and extracts edges from the infrared images (FLIR images). A separate hypothesis formation is carried out with each of these features, and finally these hypotheses are fused. The entire preprocessing is implemented in a fixed sequence in the system; only the geometry models to be found are interchangeable. The precise type and sequence of the preprocessing is given in EP-A-622 750.

[0028] A currently particularly preferred exemplary embodiment of the invention will now be explained with reference to the accompanying FIGS. 1 through 5e. This exemplary embodiment can be modified in a manner well known to one skilled in the art, and it is by no means intended to restrict the scope of protection of the invention to the example below. Rather, the scope of protection is determined by the features of the claims and their equivalents.

[0029] FIG. 1 shows the sequence of operations of the object recognition at the highest level. In step 1, the image creation block acquires the image with a camera, loads a stored image or produces a VR image. An image acquired in the image creation block of FIG. 1 is shown by way of example in FIG. 3.

[0030] In step 2 (ROI creation) a simple and quick rough detection of the object in the image takes place, i.e., a rectangular region that most nearly encloses the sought objects is positioned. The abbreviation ROI (region of interest) denotes this region enclosing the sought objects, which can be seen with reference to FIG. 4. Methods for determining such an ROI are known per se; these include threshold value methods, pixel classification, etc. An assignment of the currently formed ROI to an ROI from the last image must also be made.
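
A threshold-value method of this kind could, purely as a sketch, look as follows (function name and interface are assumptions):

```python
import numpy as np

def roi_from_threshold(image: np.ndarray, threshold: float):
    """Rough detection of step 2: the bounding rectangle of all pixels whose
    intensity exceeds a threshold; a stand-in for the threshold-value
    methods mentioned in the text."""
    ys, xs = np.nonzero(image > threshold)
    if xs.size == 0:
        return None  # no object candidate found in the image
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```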

[0031] In step 3 a decision is made on whether the object in the region of interest was provided with an ROI for the first time or not. This step is necessary since, at first, no hypotheses to be tested yet exist that are assigned to the ROI, and so no test of the hypotheses can take place. If the decision in step 3 is “yes,” the hypothesis initialization takes place in step 4. Here the assignment of one or more 7-tuples to an ROI is carried out. The 7-tuple comprises the type of object (e.g., model number (in the case of a helicopter 1=Hind, 2=Helix, 3=Bell Ranger, etc.)) and the estimated six degrees of freedom under the assumption of this model class. The initial compilation of the six degrees of freedom can be made, e.g., through systematic testing.
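
For illustration, the 7-tuple could be represented as follows (field names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """The 7-tuple of one object hypothesis: model class plus six degrees of freedom."""
    model_class: int  # e.g. 1 = Hind, 2 = Helix, 3 = Bell Ranger
    x: float          # translation, 3 degrees of freedom
    y: float
    z: float
    roll: float       # rotation, 3 degrees of freedom
    pitch: float
    yaw: float
```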

[0032] If the decision in step 3 is “no,” the hypotheses update is carried out in step 5. In the event of an already existing hypothesis, the new position created by the movement of the object in space has to be matched to the position of the object in the image. To this end a movement prediction known in the prior art is carried out by means of a tracker (e.g., Kalman filter).
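
As one plausible instance of such a movement prediction, a constant-velocity Kalman prediction step for the image position might be sketched as follows (state layout and noise value are assumptions):

```python
import numpy as np

def kalman_predict(x, P, dt, q=1e-2):
    """Prediction step of a constant-velocity Kalman filter.
    State x = [u, v, du, dv] (image position and velocity); P is the 4x4
    state covariance; dt is the time between two images."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    x_pred = F @ x                        # move the hypothesis forward in time
    P_pred = F @ P @ F.T + q * np.eye(4)  # grow the uncertainty accordingly
    return x_pred, P_pred
```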

[0033] The matching described in detail with reference to FIG. 2 takes place in step 5 of FIG. 1.

[0034] The 2D-3D pose estimate is implemented in step 6 of FIG. 1. The change of position of the object in space can be estimated from the change of position of the receptors and the assumed position of the receptors in space (from the hypothesis) by means of the 2D-3D pose estimate. Methods for this are known in the prior art (cf., e.g., Haralick: Pose Estimation from Corresponding Point Data, IEEE Transactions on Systems, Man and Cybernetics, Vol. 19, No. 6, November/December 1989).
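
For illustration, such a 2D-3D pose estimate from receptor correspondences could be delegated to OpenCV's solvePnP, used here merely as a stand-in for the cited Haralick-style method; the camera matrix follows the model of paragraph [0047]:

```python
import numpy as np
import cv2

def pose_from_receptors(points_3d, points_2d, f, sx, sy):
    """Estimate the 6 degrees of freedom from corresponding receptor points.
    points_3d: Nx3 model points; points_2d: Nx2 matched image points."""
    K = np.array([[f * sx, 0.0, 0.0],
                  [0.0, f * sy, 0.0],
                  [0.0, 0.0, 1.0]])  # camera model as in the text
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, distCoeffs=None)
    return ok, rvec, tvec  # rotation (Rodrigues vector) and translation
```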

[0035] The quality of the model is determined in step 7 (“better” block) of FIG. 1. This is necessary since the matching violates the rigidity property of the object. The rigidity is guaranteed through the pose estimation and new projection, since errors of individual receptors are averaged and a single pose (6 degrees of freedom) is generated for all receptors. A further matching in the same image is useful in order to achieve the best possible result here, i.e., the smallest possible error between hypothesis and image. If this brings a deterioration (or only a very slight improvement), it is assumed that the optimum point has already been reached.

[0036] The evaluation of all hypotheses of an ROI, in particular their quality values, takes place in step 8 of FIG. 1 (“classification” block). The classification produces either the decision for a certain class and pose (by selection or combination of pose values of different hypotheses) or the information that the object cannot be assigned to any known class.

[0037] The evaluation of class, quality and orientation takes place in step 9 of FIG. 1. The information from the classification can be displayed to the user in different ways (e.g., position and class as overlay in the image) or actions can be directly derived therefrom (e.g., triggering a weapon). This can be determined after each image, at greater regular intervals, or when specific quality thresholds of the classification are exceeded or fallen below.

[0038] The details of the adjustment (matching) are explained with reference to FIG. 2.

[0039] The examination of rules takes place in step 10 of FIG. 2. The rule of each receptor is evaluated, and on the basis of the result the receptor is incorporated into the 2D representation (graph) or not. Since various rules, which may process any desired information to produce the rule result, can exist for various applications, how the method operates is described here using the example of a geometrically motivated rule function. It should be noted that the parameters of the rule function need not take into account only the geometry of the object and its current pose. Other information (e.g., position of the sun, horizon line, friend/foe positions, radio beacons, time of day), as available, can also contribute to the rule result.

[0040] The rule function of the vector angle rule contains three parameters that are stored in the model:

[0041] a, b and x. The result is r.

[0042] The rule function itself has the following form: $\cos\beta = \frac{\left\langle \underline{\underline{R}}\,\underline{x},\; -\underline{z} \right\rangle}{\left\lVert \underline{\underline{R}}\,\underline{x} \right\rVert \, \left\lVert -\underline{z} \right\rVert}$ and $r = \begin{cases} 1 & \beta < a \\ 1 - \frac{\beta - a}{b} & a \le \beta \le a + b \\ 0 & \beta > a + b \end{cases}$

[0043] The vector z is the unit vector in direction z (view direction of the camera). The matrix R is the rotation matrix from the hypothesis that rotates the model from its original position (parallel to the camera coordinate system) into its current view. x is a vector that describes the central view direction from the object outwards (e.g., the outside normal of a surface).

[0044] If r produces a value different from 0, the receptor is incorporated into the 2D representation. The values between 0 and 1 are available for further evaluation but are not currently in use.
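
A direct transcription of the rule function into Python might read (angle parameters assumed to be in radians):

```python
import numpy as np

def vector_angle_rule(R, x, a, b):
    """Vector angle rule of paragraph [0042].
    R: rotation matrix from the hypothesis; x: outward view direction of the
    receptor on the model; a, b: model parameters (radians). Returns r."""
    z = np.array([0.0, 0.0, 1.0])  # view direction of the camera
    Rx = R @ x
    cos_beta = np.dot(Rx, -z) / (np.linalg.norm(Rx) * np.linalg.norm(z))
    beta = np.arccos(np.clip(cos_beta, -1.0, 1.0))
    if beta < a:
        return 1.0                   # receptor fully applicable
    if beta <= a + b:
        return 1.0 - (beta - a) / b  # linear transition zone
    return 0.0                       # receptor is not incorporated
```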

[0045] The projection of the receptors is carried out in step 11 of FIG. 2.

[0046] Step 11 is carried out separately (and possibly in a parallel manner) for each receptor that is included in the graph through the test. The receptor reference point $\underline{p}^{3}$ is thereby first projected into the image matrix as $\underline{p}^{2}$:

$\underline{p}^{2} = \underline{\underline{P}} \left( \underline{\underline{R}}\,\underline{p}^{3} + \underline{t} \right)$

[0047] Matrix R is the above-mentioned rotation matrix; t is the vector from the origin of the camera coordinate system to the origin of the model coordinate system in the scene (translation vector). Matrix P is the projection matrix or camera model: $\underline{\underline{P}} = \begin{bmatrix} f s_{x} & 0 & 0 \\ 0 & f s_{y} & 0 \\ 0 & 0 & 1 \end{bmatrix}$

[0048] f is thereby the focal length of the camera, and $s_{x}$ and $s_{y}$ are the resolution of the camera in pixels per mm. $\underline{p}^{2}$ is a homogeneous vector (u, v and scaling) in pixels relative to the camera perspective center. This is converted accordingly into the pixel coordinates x and y.
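
In code, the projection of a reference point according to paragraphs [0046]-[0048] could be sketched as:

```python
import numpy as np

def project_reference_point(p3, R, t, f, sx, sy):
    """Projects the 3D receptor reference point p3 into the image matrix:
    p2 = P (R p3 + t), followed by division by the homogeneous scaling."""
    P = np.array([[f * sx, 0.0, 0.0],
                  [0.0, f * sy, 0.0],
                  [0.0, 0.0, 1.0]])
    p_hom = P @ (R @ np.asarray(p3, dtype=float) + np.asarray(t, dtype=float))
    return p_hom[:2] / p_hom[2]  # pixel coordinates relative to the perspective center
```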

[0049] Subsequently the projection function of the receptor is called, which function projects the receptor-specific data. An example of this is an edge receptor, the beginning and end points of which are defined in 3D on the model and are projected into the image matrix through this function in the same way as the reference point.

[0050] The storage of the 3D points takes place in step 12. A list of hypothesis points is created in 3D, whereby one or more points per receptor are stored in a defined sequence. The receptor reference point of each receptor can always be found in the list; further points are optional. In addition, the edge receptor stores the beginning and end points.

[0051] The graph creation is implemented in step 13. A graph is created through tessellation from the mass of the points projected into the image matrix, if it is necessary for the following matching process. The method used is known and described in the following article: Watson, D. F., 1981, Computing the n-dimensional Delaunay tessellation with application to Voronoi polytopes: The Computer J., 24(2), p. 167-172.
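
As a sketch, an off-the-shelf Delaunay tessellation (here SciPy/Qhull rather than Watson's original algorithm) would serve the same purpose:

```python
import numpy as np
from scipy.spatial import Delaunay

def graph_from_projected_points(points_2d):
    """Graph creation of step 13: tessellate the projected receptor points
    and collect each triangle side as an undirected graph edge."""
    tri = Delaunay(np.asarray(points_2d, dtype=float))
    edges = set()
    for simplex in tri.simplices:  # index triples of the triangles
        for i in range(3):
            edge = tuple(sorted((int(simplex[i]), int(simplex[(i + 1) % 3]))))
            edges.add(edge)
    return edges
```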

[0052] The 2D matching is carried out in step 14, whereby either the elastic graph matching method according to Prof. v.d. Malsburg or another method with a similar objective is carried out. We implemented a method of this type with special properties connected to the tracking of the object. Through the method, the best possible position of the sought feature has to be found near the start position, whereby a trade-off between feature quality and deviation from the given graph configuration is desirable. In this step it is therefore necessary to carry out some kind of scanning of the image with the application function of the receptor. The match quality of the application function is assigned to each scanned position so that the most favorable position can be determined.
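
The scanning described here might, in its simplest form, look like the following local search; this is a sketch of the scanning idea only, not v.d. Malsburg's elastic graph matching, and the quality function, radius and stiffness weight are assumptions:

```python
import numpy as np

def match_receptor(quality, start, radius, stiffness):
    """Scan the neighbourhood of the start position and trade off feature
    quality against deviation from the given graph configuration.
    quality(x, y) -> float is the application function of the receptor."""
    (x0, y0), best, best_score = start, start, -np.inf
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            score = quality(x, y) - stiffness * np.hypot(dx, dy)
            if score > best_score:
                best, best_score = (x, y), score
    return best, best_score
```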

[0053] It will now be shown how the feature request works using the example of the edge receptor. To this end its algorithm is given as pseudocode:

[0054] req = root of the preprocessing tree (5.a)

[0055] req = request(req, edgeimage, threshold=10, sigma=1) (5.b)

[0056] req = request(req, distanceimage, maximumdistance=100) (5.c)

[0057] image = imagefromtree(req) (5.d)

[0058] determine chamfer distance along the line(image, line) (5.e)

[0059] From the image creation (block 1) up to the beginning of 5.b, the preprocessing cache is occupied only with the original image.

[0060] According to the pseudocode 5.a (see FIG. 5a), the pointer req is placed on the root of the tree.

[0061] In the request (5.b) (cf. FIG. 5b) it is established that there are as yet no nodes of the edge image type with the above-mentioned parameters. Such a node is then produced by means of the registered routine for calculating an edge image.

[0062] (5.c) produces the distance image in the same way (cf. FIG. 5c).

[0063] (5.d) reads out the image from req, and (5.e) calculates the quality of the feature by determining the average distance (in pixels) from an image edge. To this end the values are taken directly from the edge image. Reference is made to FIGS. 5d and 5e.

[0064] In estimating the next position, the tree iterator (req) is placed back at the root in (5.a), and in (5.b) and (5.c) it is moved on without calculation.
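
Purely for illustration, the request sequence (5.a)-(5.e) could be realized with the PreprocessingTree sketch from paragraph [0023]; the two routines below are stand-ins (gradient edges, Euclidean distance transform) for whatever registered routines an actual system would use:

```python
import numpy as np
from scipy import ndimage

def edge_image(img, threshold, sigma):
    """(5.b) stand-in: gradient-magnitude edges of the smoothed image."""
    smoothed = ndimage.gaussian_filter(np.asarray(img, dtype=float), sigma)
    gy, gx = np.gradient(smoothed)
    return (np.hypot(gx, gy) > threshold).astype(np.uint8)

def distance_image(edges, maximum_distance):
    """(5.c) stand-in: distance to the nearest edge pixel, clipped."""
    return np.minimum(ndimage.distance_transform_edt(edges == 0), maximum_distance)

def chamfer_distance_along_line(image, line):
    """(5.e): average distance (in pixels) of the sampled line points to an edge."""
    return float(np.mean([image[int(y), int(x)] for x, y in line]))

# (5.a)-(5.d) with the PreprocessingTree sketch from paragraph [0023]:
# tree = PreprocessingTree(original_image)                   # (5.a)
# chain = [(edge_image, {"threshold": 10, "sigma": 1}),      # (5.b)
#          (distance_image, {"maximum_distance": 100})]      # (5.c)
# image = tree.request(chain)                                # (5.d)
# quality = chamfer_distance_along_line(image, line_points)  # (5.e)
```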

[0065] Other receptors that are deposited in the model can expand this tree further, as the free space on the right side of FIG. 5e is intended to indicate.

[0066] The storage of the 2D points takes place in step 15 of FIG. 2. The points $\underline{p}^{2}$ according to the matching step are deposited in a list in the same sequence as in step 12. It should thereby be ensured that the synchronicity of both lists is still guaranteed in order to avoid any inconsistencies in matching.

1-9. (canceled)
10. A method for at least one of model-based classification and target recognition of an object, the method comprising: recording an image of an object; determining a feature that represents a part of the object; determining at least one condition associated with the feature that indicates an applicability of the feature based on at least one of: covering by parts of the object, a current illumination situation, time the image is recorded, movement information of the object, movement information of other objects, and a position and orientation of the object; and carrying out the at least one of classification and target recognition of the object by recording the feature when the at least one condition indicates the applicability of the feature, wherein the position and orientation of the object are based upon at least one of an image-recording device, a technical device carrying the image-recording device, objects classified and localized with the present method, objects classified or localized with other methods, and fixed facilities.

11. The method according to claim 10, wherein the determining of the feature that represents a part of the object comprises determining a plurality of features, wherein the determining of the at least one condition comprises determining at least one condition for each of the plurality of features, and wherein the carrying out the at least one classification and target recognition of the object comprises at least one of classifying and target recognizing of the object through the detection of the plurality of features.

12. The method according to claim 10, wherein the determining of the feature that represents a part of the object comprises determining a plurality of features.

13. The method according to claim 12, wherein the determining of at least one condition comprises determining at least one condition for each of the plurality of features.

14. The method according to claim 12, wherein the carrying out of the at least one classification and target recognition of the object comprises at least one of classifying and target recognizing of the object through the detection of the plurality of features.

15. The method according to claim 10, wherein a programmable algorithm is associated with the at least one condition and the method further comprises programming the algorithm as desired.

16. The method according to claim 10, wherein the at least one condition comprises one of: geometry of the object, distance of the object from a camera, illumination conditions, contrast, speed of the object, height of the object, and relative position of the object to the camera.

17. The method according to claim 10, further comprising: preprocessing for the detection of a specific feature; testing, before the preprocessing for the detection of the specific feature, whether the preprocessing for the detection of the specific feature has been carried out in connection with another feature; and using, when preprocessing for the detection of the specific feature has been carried out for the another feature, the preprocessing of the another feature as the preprocessing for the detection of the specific feature.

18. The method according to claim 17, further comprising: storing the preprocessing in a cache memory.

19. The method according to claim 17, wherein the specific feature is one of a left edge and a right edge of an object and the preprocessing of each of these features comprises edge image preprocessing.

20. The method according to claim 10, further comprising: storing all reusable preprocessing as a sequence of compilation.

21. The method according to claim 18, wherein the cache is not restricted to a type of preprocessing.

22. A method for at least one of model-based classification and target recognition of an object, the method comprising: recording an image of an object; determining a feature that represents a part of the object; determining at least one condition associated with the feature that indicates an applicability of the feature based on at least one of: covering by parts of the object, a current illumination situation, time the image is recorded, movement information of the object, movement information of other objects, and a position and orientation of the object; and carrying out the at least one classification and target recognition of the object by recording the feature when the condition indicates the applicability of the feature, wherein the condition is one of geometry of the object, distance of the object from a camera, illumination conditions, contrast, speed of the object, height of the object, and relative position of the object to the camera.

23. The method according to claim 22, wherein the determining of the feature that represents a part of the object comprises determining a plurality of features, wherein the determining of the at least one condition comprises determining at least one condition for each of the plurality of features, and wherein the carrying out the at least one classification and target recognition of the object comprises at least one of classifying and target recognizing of the object through the detection of the plurality of features.

24. (New) The method according to claim 22, wherein the determining of the feature that represents a part of the object comprises determining a plurality of features.

25. The method according to claim 24, wherein the determining of at least one condition comprises determining at least one condition for each of the plurality of features.

26. The method according to claim 24, wherein the carrying out of the at least one classification and target recognition of the object comprises at least one of classifying and target recognizing of the object through the detection of the plurality of features.

27. The method according to claim 22, wherein a programmable algorithm is associated with the at least one condition and the method further comprises programming the algorithm as desired.

28. The method according to claim 22, further comprising: preprocessing for the detection of a specific feature; testing, before the preprocessing for the detection of the specific feature, whether the preprocessing for the detection of the specific feature has been carried out in connection with another feature; and using, when preprocessing for the detection of the specific feature has been carried out for the another feature, the preprocessing of the another feature as the preprocessing for the detection of the specific feature.

29. The method according to claim 28, further comprising: storing the preprocessing in a cache memory.

30. The method according to claim 28, wherein the specific feature is one of a left edge and a right edge of an object and the preprocessing of each of these features comprises edge image preprocessing.

31. The method according to claim 22, further comprising: storing all reusable preprocessing as a sequence of compilation.

32. The method according to claim 29, wherein the cache is not restricted to a type of preprocessing.